REVIEW

Lessons from next-generation sequencing analysis 
in hematological malignancies 

E Braggio, JB Egan, R Fonseca and AK Stewart

Next-generation sequencing has led to a revolution in the study of hematological malignancies with a substantial number of 
publications and discoveries in the last few years. Significant discoveries associated with disease diagnosis, risk stratification, clonal 
evolution and therapeutic intervention have been generated by this powerful technology. As part of the post-genomic era, 
sequencing analysis will likely become part of routine clinical testing and the challenge will ultimately be successfully transitioning 
from gene discovery to preventive and therapeutic intervention as part of individualized medicine strategies. In this report, we 
review recent advances in the understanding of hematological malignancies derived through genome-wide sequence analysis. 
INTRODUCTION

Over the past few years, a remarkable effort has been underway to
identify the genetic basis of hematological malignancies, catalyzed
by the increasing availability and refinement of sequencing
technologies (Figure 1). The growth in high-throughput sequencing
that has facilitated this effort has been exponential, with
a dramatic increase in efficiency and a corresponding drop in price
per base pair. Modern platforms can now perform whole-genome
sequencing (WGS) of an individual for less than $5000 in a few
days of work, notable progress compared with the resources and
time used just a few years ago by an international
consortium to complete the first human genome. Strikingly,
this technological revolution occurred in <10 years.

A representative example of how increasingly powerful
technologies continuously improve our knowledge of the genetic
basis of hematological malignancies comes from the genome of an
acute myeloid leukemia (AML) patient with normal cytogenetics,
which was studied twice in a period of 2 years. The
initial WGS analysis revealed only small insertions and
deletions affecting two genes, and nonsynonymous somatic
mutations in another eight genes. Two years later, the same
genome was resequenced with more advanced sequencing
technology and analytical methods, resulting in the detection of
a previously unidentified frameshift deletion in DNMT3A. After
the initial discovery, DNMT3A mutations were screened in large
cohorts and it is now known that 22-30% of AML patients have
mutations in this gene, making it currently one of the most relevant
and potentially targetable mutations found in AML.

Next-generation sequencing (NGS) encompasses several
different methodologies that allow the investigation of genomics,
transcriptomics and epigenomics. The different sequencing
approaches are briefly described below and summarized in
Table 1. For more in-depth information, we direct the
reader to a number of excellent reviews.



WHOLE-GENOME SEQUENCING 

Two major approaches are utilized in the preparation of DNA
libraries for WGS. The first is called paired-end sequencing, where
~100 bp are sequenced from each end of ~400-bp DNA
fragments. By this method, single nucleotide variants (SNV),
insertions and deletions, and copy-number changes can be
identified. Paired-end WGS needs low-input quantities of DNA
(<1 µg) for generating the libraries, which is a critical advantage
in the study of hematological malignancies, where the amount of
tumor tissue is usually scarce.

The second approach used in DNA library preparation is
named mate-pair sequencing. Mate-pair sequencing is based on the
generation of much larger DNA fragments than paired-end sequencing,
with fragments ranging in length from 1 to 10 kb. The longer
distance between the read pairs improves detection of
structural rearrangements because the read pairs can span repeat
and duplicated regions, thereby capturing regions not adequately
captured with the smaller insert sizes used in paired-end
sequencing. Very low coverage of the genome is enough for
studies focused on the detection of structural abnormalities, thus
reducing the cost and complexity of the analysis. On the other hand, if
the genome is covered in enough depth (>30X mean coverage),
mate-pair sequencing can be used for the simultaneous detection
of mutations, copy-number changes and structural abnormalities.
A disadvantage of the mate-pair approach is that a large
quantity of DNA is required for the library preparation, thus limiting
its use in a significant number of tumors.
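
The >30X figure above is a mean sequence coverage. As a rough sketch of
the arithmetic, mean coverage can be approximated as the total number of
sequenced bases divided by the genome size. The function names and the
round 3 Gb genome size used below are illustrative assumptions and are
not taken from the text or from any platform documentation; the estimate
also ignores duplicate reads, unmapped reads and uneven coverage.

```python
import math


def mean_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Approximate mean depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Smallest number of reads whose combined length reaches target_depth."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    total_bases = target_depth * genome_size_bp
    return math.ceil(total_bases / read_length_bp)


if __name__ == "__main__":
    ASSUMED_GENOME_BP = 3_000_000_000  # assumption: round figure for a human genome
    # 100-bp reads, as in the paired-end description above
    n = reads_needed(30, 100, ASSUMED_GENOME_BP)
    print(n, mean_coverage(n, 100, ASSUMED_GENOME_BP))
```

Under these assumptions, 30X coverage with 100-bp reads corresponds to
roughly 900 million reads; real experiments need a margin above this
estimate.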



WHOLE-EXOME SEQUENCING 

Whole-exome sequencing (WES) is useful for those interested in 
studying only what lies within the exome (coding genome) and 
untranslated regions. This method is based on an initial 
enrichment step of exonic regions followed by targeted 
Figure 1. Evolution of genetic detection methods and discoveries. Landmark findings from each method are indicated.



Table 1. Summary of high-throughput sequencing methods

| Method | Minimum input quantity (a) | Strengths | Weaknesses |
|---|---|---|---|
| Whole genome | 10 ng-1 µg genomic DNA | Small input DNA requirement; variant detection in all regions of the genome | Lower DNA input may reduce library complexity and representation; PCR duplicates can impact accuracy of variant detection software; computationally intensive analysis |
| Mate pair | 5-10 µg genomic DNA | Identification of large structural rearrangements | Large input DNA requirement; high false discovery rate |
| Whole exome | 1 µg genomic DNA | Deep coverage of exome enabling precise interrogation of coding regions; multiple samples can be pooled and run together, reducing time and cost per sample | Non-coding regions excluded; standard capture kits do not capture all exons |
| mRNASeq | 100-400 ng total RNA (b) | Dynamic range of expression detection can be much broader than using microarrays; detection of rare and hybrid transcripts; precise quantitation of highly expressed transcripts and multiple isoforms; investigation of 3'UTR and promoters | RNA fragmentation methods can bias the resulting library; artifacts from amplified cDNA libraries; appropriate normal controls may be difficult to obtain for tumor/normal comparison |
| ChIPSeq | 10 ng ChIP-enriched DNA | Detection of DNA-protein interactions; discovery of new interactions in regions not represented on microarray chips; avoids hybridization problems associated with array-based ChIP assays | Quality of sequencing results dependent on the quality of ChIP assay; library preparation can introduce GC-rich region bias |
| Single molecule | 1 µg genomic DNA | No amplification step, resulting in no PCR duplicates; long-read length (>1 kb) | High error rate; throughput not comparable to current platforms |

Abbreviations: ChIPSeq, chromatin immunoprecipitation sequencing; UTR, untranslated region. Cost per sample is highly variable depending on the platform
and on the amount of multiplexing utilized. (a) May vary by platform and approach. (b) May require polyA RNA- or rRNA-depleted total RNA.



sequencing. As the exome represents only 1.4% of the genome,
multiple samples can be pooled and sequenced together in a
single instrument run. The major weakness of WES is the inability
of the available enrichment kits to capture the totality of the
exome.



Ideally, a non-tumoral reference DNA sample from the individual
patient is analyzed simultaneously with each tumor. The amount
of normal variation between individuals is in the order of
thousands of variants, and performing a paired analysis enables
subtraction of the nontumor-specific from the tumor-specific
variants. In the event that normal tissue is not available for
comparison, an increasing number of publicly available databases,
such as dbSNP (http://www.ncbi.nlm.nih.gov/projects/SNP/),
HapMap (http://hapmap.ncbi.nlm.nih.gov/) and 1000 genomes
(http://www.1000genomes.org/), can be utilized to identify and
filter out variants previously reported in the general population that
represent normal genetic variation rather than somatically acquired
mutations.
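
The subtraction described above can be sketched as a set difference. This
is a minimal illustration, assuming variants are represented by
chromosome, position, reference allele and alternate allele; the
`Variant` type, the function name and the toy coordinates are
hypothetical and do not correspond to any specific variant caller or
database client.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Variant(NamedTuple):
    """Hypothetical minimal variant key: chromosome, position, ref and alt alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


def candidate_somatic(
    tumor: Iterable[Variant],
    matched_normal: Optional[Iterable[Variant]] = None,
    population_variants: Iterable[Variant] = (),
) -> Set[Variant]:
    """Return tumor variants absent from the matched normal (when available)
    and from a set of previously reported population variants."""
    germline = set(population_variants)
    if matched_normal is not None:
        germline |= set(matched_normal)
    return set(tumor) - germline


if __name__ == "__main__":
    # Toy coordinates for illustration only; they are not real variant calls.
    tumor = [Variant("chr1", 100, "A", "G"), Variant("chr2", 200, "C", "T")]
    normal = [Variant("chr1", 100, "A", "G")]
    print(candidate_somatic(tumor, matched_normal=normal))
```

Variants that survive this filter are only candidates: without a matched
normal, rare germline variants missing from the databases would still
pass.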



MESSENGER RNA SEQUENCING 

Besides the detection of mutations, messenger RNA sequencing is 
also a powerful tool for gene expression analysis. The dynamic 
range of expression obtained by messenger RNA sequencing can 
be much broader than that obtained by gene expression 
microarrays, allowing the detection of rare transcripts and more 
precise quantitation of expressed transcripts. In addition,
messenger RNA sequencing can be utilized for the identification 
of hybrid transcripts and quantitation of multiple isoforms 
resulting from alternative splicing. Major weaknesses include the 
biases in library preparation caused by the RNA fragmentation 
methods utilized, introducing artifacts into the resulting reads. 



NEW APPROACHES AND FRONTIERS 

Which approach is the most appropriate for studying the cancer
genome is still the subject of debate. WGS is the
most inclusive approach, but major limitations remain related to
its high cost and the difficulty of managing data
storage and intensive computational analysis. The study of the
exome reduces these limitations. WES is a well-established
strategy for analyzing coding regions at low cost, making it
currently the most popular approach in the analysis of the tumor
genome. However, eliminating 98% of the genome from the
analysis brings the obvious risk of omitting crucial
information. This concern is supported by recent findings
from the Encyclopedia of DNA Elements Project, where
integrated analysis demonstrated that >80% of the genome is
biochemically active. This new paradigm will require
reconsideration of the best strategic approach to optimize the
cost/benefit ratio in the analysis of the cancer genome.

Single-molecule, long-read sequencing approaches are now
available and allow the simultaneous search for single-allele
mutations and methylation profiles. Furthermore, several new
platforms currently under development promise to
sequence the whole genome in a few hours for less than $1000.
These technological advances open a new world of opportunities
and will soon put the use of WGS within reach of clinical laboratories.



CHALLENGES IN DATA ANALYSIS 

Data generation is just a small facet of the much bigger challenge
associated with data analysis. The goal of data analysis is to use
bioinformatics tools in a 'pipeline' (Figure 2) that
transforms the raw data into results that can ultimately be viewed in a
user-friendly visualization tool. Detailed information about the
most commonly utilized alignment and functional prediction tools
is beyond the scope of this manuscript, and thus we direct the
readers' attention to several informative manuscripts.

The ability to quickly generate large quantities of data at
relatively low cost is limited by these constantly evolving data
analysis pipelines, by the failure to report analytical methods with the
level of detail expected from traditional experimental data and by the
lack of consensus regarding which tools to use when transforming
the data into a usable form. Nekrutenko and Taylor reviewed
50 papers that used the Burrows-Wheeler Aligner for mapping
sequencing reads, and they found that most of the papers neither
provide access to the raw data nor specify the parameters utilized



1. Alignment to human genome (BWA): http://bio-bwa.sourceforge.net/
2. Post-alignment processing (GATK, Picard): http://www.broadinstitute.org/gatk/index.php, http://picard.sourceforge.net/
3. Mutation calling, insertion/deletion detection, structural analysis (Samtools): http://samtools.sourceforge.net/
4. Annotation (SeattleSeq): http://snp.gs.washington.edu/SeattleSeqAnnotation131
5. Functional prediction (SIFT, Polyphen2): http://sift.jcvi.org, http://genetics.bwh.harvard.edu/pph2/index.shtml



Figure 2. Schematic of a bioinformatics pipeline. Examples of the 
most commonly used publicly available software programs utilized 
at a particular step are in parentheses. The programs listed were 
the most commonly used in 2012 hematological malignancy 
sequencing analyses. These are only examples and are not intended 
to be an exhaustive list. The number of publicly available tools is 
rapidly expanding and review of these tools is beyond the scope of 
this report. 



or identify the precise version of the genomic reference sequence.
From the remaining analyses, only four provide settings, eight list
the version used and seven list all necessary details. Furthermore,
of 19 sequencing articles that cited the 1000 Genomes project and
had a similar experimental design, only 4 used the
recommended workflow. Different analysis pipelines can yield
similar but not identical results, suggesting that more than one
pipeline may currently be necessary to successfully analyze the
data and ultimately introduce this technology into the clinic.
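
The reporting gaps described above (missing raw data access, parameters,
tool versions and reference sequence version) suggest keeping a
machine-readable record alongside each analysis. The following is only a
sketch of one possible record, assuming a simple list of steps; the
class names, field names and placeholder values are hypothetical and do
not reflect the options of BWA, GATK or any other real tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class PipelineStep:
    """One step of an analysis pipeline (all fields are free-text labels)."""
    name: str
    tool: str
    version: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class AnalysisRecord:
    """Details a reader would need in order to repeat the analysis."""
    reference_genome: str
    raw_data_location: str
    steps: List[PipelineStep] = field(default_factory=list)

    def missing_details(self) -> List[str]:
        """List the reporting items that were left empty."""
        problems = []
        if not self.reference_genome:
            problems.append("reference genome version")
        if not self.raw_data_location:
            problems.append("raw data location")
        for step in self.steps:
            if not step.version:
                problems.append(f"{step.name}: tool version")
            if not step.parameters:
                problems.append(f"{step.name}: parameters")
        return problems

    def to_json(self) -> str:
        """Serialize the record so it can be deposited with the results."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


if __name__ == "__main__":
    record = AnalysisRecord(
        reference_genome="<reference build>",      # placeholder, not a real build name
        raw_data_location="<archive accession>",   # placeholder
        steps=[PipelineStep("alignment", "<aligner>", "", {})],
    )
    print(record.missing_details())
    print(record.to_json())
```

A check such as `missing_details()` could be run before submission so
that incomplete method descriptions are caught early.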



SEQUENCING IN HEMATOLOGICAL MALIGNANCIES 

At this point, when medium-sized cohorts of most
hematological malignancies have been sequenced, it is time to
ask what these data have revealed thus far in this group of
malignancies and what opportunities lie ahead. As
expected, genes such as TP53, ATM and RAS, among others, were
confirmed as mutated in a wide variety of malignancies. However,
promising and exciting findings came from the discovery of
a completely novel group of genes and pathways impaired in
hematological malignancies. A few malignancies seem to be
driven by mutations in only one or a few genes, suggesting
a unique pathway, pathognomonic to the disease. However, most
of the malignancies show considerable genetic heterogeneity,
with multiple genes and pathways affected. In this review, we aim
to summarize the current knowledge of the genetic background
Figure 3. Most frequent somatic genetic mutations per hematological malignancy. Only original data from massively parallel sequencing were
included, excluding confirmation data in previously mutated genes (for example, ATM and TP53 in CLL, FLT3 in AML, RAS and TP53 in MM,
MYD88 in DLBCL, mutations in the nuclear factor-κB pathways in DLBCL and MM). In cases where data were obtained from multiple studies,
the data originating from the largest cohort were included.



on different hematological malignancies and how this knowledge
could facilitate therapeutic targeting of dysregulated signaling
pathways. The most recurrent novel somatic genetic
mutations per malignancy are summarized in Figure 3 and Table 2.



SINGLE CAUSATIVE MUTATIONS: HAIRY CELL LEUKEMIA AND 
WALDENSTROM'S MACROGLOBULINEMIA AS PARADIGMS 

Probably the most representative example of a single hit identified
by sequencing is hairy cell leukemia (HCL). Initially, WES was
performed on a single HCL tumor/normal pair, with somatic
mutations identified in five genes: BRAF, CSMD3, SLC5A1, CNTN6
and OR8J1. Another 47 HCL cases were screened for BRAF
mutations and, strikingly, the BRAF V600E substitution was found in
all 47 patients evaluated. Conversely, BRAF mutations were absent
in related peripheral B-cell lymphomas and chronic lymphocytic
leukemia (CLL), and were found in only a small subset of multiple
myeloma (MM) patients (4%). The same activating mutation and
its damaging effect have been previously reported in solid tumors
such as melanoma and papillary thyroid cancer. The presence
of a common mutation across HCL provides a novel
therapeutic avenue based on V600E BRAF inhibitors,
alone or in combination with MEK or ERK inhibitors. The success of
vemurafenib, a BRAF inhibitor, in the treatment of V600E
BRAF-mutated melanoma patients led two groups to each investigate
the effectiveness of this small molecule inhibitor in one HCL case
study. In both cases, including one with a biallelic V600E BRAF
mutation, treatment with vemurafenib resulted in successful
disease treatment, thus providing a rationale for clinical trials to
evaluate the use of BRAF inhibitors in HCL.

A similar situation was found in Waldenstrom's
macroglobulinemia (WM). Remarkably, a MYD88 L265P-activating mutation
was recently found in 90% of WM cases. MYD88 encodes an
adapter protein that affects the interleukin-1 and toll-like receptor
pathway, with the L265P mutation leading to the dysregulation of
the nuclear factor-κB and the JAK-signaling pathways. The same
mutation has been found, but to a lesser extent, in additional B-cell
lymphomas, such as diffuse large B-cell lymphomas (DLBCL) of the
ABC type (~40%), MALT lymphomas and CLL (<10%), supporting
the key role of MYD88 in the pathogenesis of these neoplasias.
Interestingly, a recent study evaluating the association between
MYD88 L265P and clinical characteristics of WM patients reported
greater bone marrow disease involvement, higher serum IgM,
and lower IgA and IgG levels. Another group, conducting a
case-control study of the association between MYD88 L265P and
progression of IgM MGUS to WM or other lymphoproliferative
disorders, reported a trend toward progression in patients with the
mutation when compared with patients with wild-type
MYD88. These findings highlight the potential value of
MYD88 as a biomarker of disease progression in WM.

In CLL, Velusamy et al. recently identified the presence of a
YPEL5-PPP1CB RNA fusion in 95% of CLL patients screened.
Interestingly, WGS in the two index cases possessing the chimera
did not reveal the presence of a gene fusion at the DNA level.
These findings emphasize the importance of concurrently utilizing
multiple methodologies such as WGS and RNASeq when studying
tumors to better screen for genetic abnormalities.

One of the recurrent findings of sequencing research efforts has 
been the epistatic nature of discoveries. This notion reinforces the 
thought of classifying disease more along the line of functional 
aberrant pathways, rather than on specific genetic changes. In 
fact, excluding HCL and WM, the majority of the malignancies 
show a considerable genetic heterogeneity, affecting multiple 
genes and pathways. Presented here are some of the most 
remarkable recent discoveries. 



MUTATIONS AFFECTING THE SPLICING MACHINERY 

Recent sequencing studies identified recurrent mutations affecting
genes of the splicing machinery in myelodysplastic syndrome
(MDS). Interestingly, six of these genes (SF3A1, SF3B1, SRSF2,
U2AF35, ZRSR2 and PRPF40B) affect the initial steps of RNA
splicing; thus, mutations leading to impaired recognition of
the 3' splice site result in abnormal mRNA
splicing. Mutations of the spliceosome are highly prevalent in MDS
and other myeloproliferative disorders, ranging from 44% of cases
Table 2. Summary of high-throughput sequencing studies performed so far in hematological malignancies



| Disease | Discovery cohort (N) | Validation cohort (N) | Method | Platform | Mean coverage depth (X) | Highlights | Year | Reference |
|---|---|---|---|---|---|---|---|---|
| MPN | 40 | | WES | HiSeq | NR | SUZ12 (3%) | 2011 | 93 |
| NHL (B-cell) | 2 | 263 | mRNASeq | GAII | NR | CIITA translocations (16%) | 2011 | 94 |
| PCNSL | 4 | 25 | WES | GAIIx | NR | MYD88 L265P (38%), TBL1XR1 (14%) | 2012 | 95 |
| SMZL | 6 | 93 | WGS | Complete Genomics | 80 | MLL2 (50%), NOTCH2 (25%) | 2012 | 69 |
| SMZL | 8 | 109 | WES | HiSeq | 111 | NOTCH2 (21%), NOTCH1 (5%), SPEN (5%), DTX1 (2%) | 2012 | 68 |
| WM | 30 | 54 | WGS | Complete Genomics | 66 | MYD88 L265P (91%) | 2012 | 26 |

Abbreviations: ALL, acute lymphocytic leukemia; AML, acute myelogenous leukemia; BL, Burkitt's lymphoma; ChIPSeq, chromatin immunoprecipitation
sequencing; CLL, chronic lymphocytic leukemia; DLBCL, diffuse large B-cell lymphoma; ETP ALL, early T-cell precursor acute lymphoblastic leukemia; FL,
follicular lymphoma; HCL, hairy cell leukemia; MCL, mantle cell lymphoma; MDS, myelodysplastic syndrome; MethylSeq, methylation sequencing; MM, multiple
myeloma; MPN, myeloproliferative neoplasms; mRNASeq, messenger RNA sequencing; NHL, non-Hodgkin's lymphoma; NR, not reported; PCNSL, primary
central nervous system lymphoma; SMZL, splenic marginal zone lymphoma; WM, Waldenstrom's macroglobulinemia. Platforms: GAII, GAIIx, HiScanSQ and
HiSeq2000 are from Illumina, Genome Sequencer FLX from 454 Sequencing Roche and SOLiD from Life Technologies. ^Indicates significant pathway
enrichment.



without increased sideroblasts to 85% of cases with increased
sideroblasts. The mutations were mutually exclusive in disease
subtypes, suggesting a key role of the spliceosome mutations
in the pathogenesis of myeloproliferative disorders. On the other
hand, mutations affecting the spliceosome were significantly
less frequent in de novo AML and myeloproliferative neoplasms.

SF3B1 was the most commonly mutated of these genes, and it
was significantly enriched in the group of MDS with increased ring
sideroblasts, refractory anemia with ring sideroblasts, and
refractory cytopenia with multilineage dysplasia and ring
sideroblasts (P<0.001). Clinically, SF3B1 mutations were associated
with fewer cytopenias and longer event-free survival. The high
prevalence of SF3B1 mutations in diseases with ring sideroblasts
and the confirmation that the mutation can be identified in
peripheral blood suggest that SF3B1 could potentially be used as a
biomarker.

SF3B1 mutation was also one of the most significant discoveries
in CLL, found in 10-15% of cases. Mutations in SF3B1 were
associated with deletion 11q (P = 0.004). Moreover, SF3B1
mutations and/or deletion 11q were predictive markers of an
earlier need for treatment (P<0.0001). Altogether, these results
indicate that mutations of the spliceosome are involved in
hematological malignancies and offer a novel therapeutic
avenue for MDS and CLL.



MUTATIONS MODULATING TRANSCRIPTION AND 
TRANSLATION 

One of the most interesting themes arising from the study of
hematological malignancies is that alterations of genes
modulating transcription and expression are a recurrent finding.
Sequencing studies in DLBCL and follicular lymphomas reported
mutations in genes involved in the histone modification process. MLL2,
a histone methyltransferase specific to the H3K4 residue, was the
most commonly mutated gene in follicular lymphoma, affecting
almost 90% of cases. The vast majority of mutations had an
inactivating effect and included missense and frameshift
mutations affecting or truncating the C-terminal domains,
including the SET domain. These findings place MLL2 mutations,
together with the t(14;18)(q32;q21), as the two most common
abnormalities in follicular lymphoma.

In addition, mutations in genes involved in histone modification were
collectively identified in ~20-40% of DLBCL and early T-cell
precursor acute lymphoblastic leukemia (ETP ALL). Furthermore,
EZH2, which is involved in histone methylation, is mutated in
DLBCL and follicular lymphoma, with the mutations occurring
in a critical SET domain. Preclinical studies in DLBCL have found
inhibition of EZH2 to be an effective therapeutic approach for
tumors containing activating mutations, thus presenting a novel
therapeutic target for the treatment of DLBCL.

Chromatin modifiers are also recurrently affected in MM. MMSET,
a histone methyltransferase transcriptional repressor, is
overexpressed in ~15% of MM as a consequence of the
t(4;14)(p16;q32). Sequencing studies show that other chromatin
modifiers are mutated in a significant subset of MM, including
KDM6A and HOXA9. In addition, the analysis of MM showed
an enrichment of mutations within genes involved in protein
translation. Thus, 42% of MM cases had mutations in this pathway,
mainly affecting FAM46C (13%), DIS3 (11%) and LRRK2 (8%).

A major finding in MDS and AML was the identification of
mutations in a set of genes associated with DNA methylation.
DNMT3A, a methyltransferase, is the most commonly mutated
gene in AML, found in around 20-30% of AML cases. Interestingly,
no mutations were found in the related genes DNMT1, DNMT3B or
DNMT3L. DNMT3A mutations were associated with poor survival
(P<0.001). In addition, mutations have been identified in U2AF1 in
MDS patients, and those harboring U2AF1 mutations were more
likely to progress to secondary AML.



OTHER BREAKTHROUGH DISCOVERIES BY NGS: IDH1 AND 
IDH2 MUTATIONS IN AML 

Another major discovery in AML was the identification of mutations
in IDH1, which encodes isocitrate dehydrogenase 1, and the related
IDH2 gene. IDH1 mutations have been observed in DLBCL,
cartilaginous tumors, astrocytoma and glioblastoma,
whereas IDH2 mutations have been reported in astrocytoma.
Interestingly, patients with grade II astrocytoma who have IDH1
mutations show significantly shorter progression-free survival than
patients with wild-type IDH1. Studies in AML have reported
mutations in 10-15% of cases, preferentially found in the
intermediate-risk cytogenetic group, and their association with
worse prognosis in a subset of AML patients has been
confirmed. Mutations in IDH1 and IDH2 are mutually exclusive
and primarily affect IDH1 at codon R132 and IDH2 at codons R140
or R172. Mutations in IDH1 were enriched in cases possessing
DNMT3A mutations. Conversely, IDH2 mutations are rarely found
together with other known recurrent mutations. In addition,
mutations in IDH1 and IDH2 seem to be mutually exclusive with
TET2 mutations. A recent study reported that mutations in IDH1
or IDH2 disrupted TET2 function and led to a hypermethylation
phenotype with impaired hematopoietic differentiation.

It becomes clear then that the morphological and
clinicopathological classification of AML is now challenged by these new
genetic findings. How many subcategories of AML exist?
Does this heterogeneity also exist in the entities that are better
defined at the chromosome level (for example, M3)? In short, the
various new perspectives for classifying AML may ultimately lead
toward a molecular and pathway approach, but in some
cases they might still bear a very significant resemblance to the older
cytogenetic classification.

OTHER BREAKTHROUGH DISCOVERIES BY NGS: NOTCH 
MUTATIONS 

Aberrant NOTCH1 signaling has been identified in both solid and
hematological tumors, and is a therapeutic target of interest
currently in preclinical and clinical trials. NOTCH1 encodes a
transcription factor that transduces extracellular signals into
expression changes in target genes, including MYC and the
PI3K-AKT signaling pathways. NOTCH receptors are involved in
cell fate determination, having a critical role in T-cell development.
In fact, impaired NOTCH1 results in a block at the earliest stages of
T-cell lymphopoiesis. Mutations in NOTCH1 lead to an active
protein isoform lacking the C-terminal domain, and have been
identified in over 50% of T-cell ALL and, to a lesser extent, in CLL,
MCL and Burkitt's lymphoma. These mutations mainly target
the PEST domain, which is required for NOTCH1 interaction with
FBW7 and subsequent NOTCH1 targeting for proteasomal
degradation.

Data suggest that NOTCH1 mutations are a progressive event in
CLL, increasing in prevalence from newly diagnosed CLL to
chemorefractory CLL to CLL patients with Richter syndrome that
underwent transformation to DLBCL. NOTCH1 mutations were
associated with trisomy 12 (P = 0.009) and with IGHV-unmutated
status. In addition to the association with more advanced
stages of the disease and with transformation to DLBCL, NOTCH1
mutations were associated with adverse biological course and
worse overall survival in CLL (P = 0.03) and MCL (P = 0.003).
In T-cell ALL, NOTCH1 mutations were associated with improved
response to glucocorticoid therapy; however, the association of
NOTCH activation and clinical outcome seems to be therapy
dependent.

On the other hand, recurrent NOTCH2-activating mutations were
identified in 21-25% of splenic marginal zone lymphomas, but only
rarely in nonsplenic MZLs and other low-grade B-cell lymphomas
and leukemias. Although these studies evaluated the
association of NOTCH2 mutations and clinical outcomes,
the findings are conflicting and additional work is necessary to
clarify the potential clinical impact of mutations in NOTCH2.
Small molecule pan-NOTCH inhibitors have not shown significant
effects as single agents targeting T-cell ALL, but there is an
improved antileukemia effect when used in combination with
inhibitors of the PI3K-AKT-mTOR pathway or CDK inhibitors.



NGS AS A TOOL FOR DISCRIMINATION OF RELATED DISEASES 

Besides the importance of identifying pathognomonic mutations and
pathways, sequencing is also a powerful tool to differentiate related
entities. Overall, 67% of ETP ALL had mutations in the RAS signaling
pathway (BRAF, JAK1, JAK3, KRAS, NRAS) or cytokine receptors (IL7R),
which was significantly higher than in non-ETP ALL (19%;
P<0.0001). Furthermore, genes involved in hematopoiesis and
lymphoid development (RUNX1, IKZF1, ETV6, GATA3 and EP300) were
also more frequently mutated in ETP ALL (58%) than in non-ETP
ALL (17%; P<0.0001). Altogether, 81% of ETP ALL cases have
mutations in either of these pathways compared with 31% of
non-ETP ALL cases (P<0.0001). A similar enrichment was identified in
genes involved in histone modification (EED, EZH2 and SUZ12), which
were more commonly mutated in ETP ALL (42%) compared with
non-ETP ALL (12%; P = 0.0001).

Mutations in genes affecting the RAS pathway, cytokine receptor
and epigenetic modification are common in AML, but are rare in
B- and T-cell neoplasias. These findings, together with previous
data, demonstrate that ETP ALL has a gene expression signature
closer to leukemic stem cells and granulocyte precursors, suggesting
that ETP ALL is a distinct entity from non-ETP ALL with a less mature
phenotype that retains the potential to become a myeloid cell.



GENOMIC SEQUENCING IN THE ANALYSIS OF CLONAL 
ARCHITECTURE AND CLONAL EVOLUTION 

Genomic sequencing performed in high-coverage depth is a useful 
tool for characterizing the clonal architecture and analyzing the 
clonal evolution in disease progression and in response to therapy. 
Ding et al7^ have provided a good example of the power of 
genomic sequencing in sequential analysis. WGS was performed in 
eight AML cases utilizing normal skin biopsies paired with tumor 
samples collected at diagnosis and after relapse. Candidate somatic 
events were analyzed by deep sequencing with a median of 590X 
coverage. In five out of eight cases, the primary sample was 
characterized by up to four mutation clusters, thus indicating the 
existence of multiple (sub)clones. Two major patterns of clonal 
evolution were identified when comparing primary versus relapse 
samples. Either the original clone in the primary tumor sample 
acquired additional mutations and evolved into the relapse clone, 
or most of the (sub)clones were eradicated by therapy, leaving one
clone. This clone is usually observed at a low frequency in the
primary sample; it then survives the initial therapy, gains additional
mutations and expands, becoming the predominant clone at
relapse. Another interesting finding was obtained by comparing the
rates of transition and transversion mutations between primary and
relapse samples. The data obtained strongly suggest that the
chemotherapy regimen used (cytarabine and anthracycline for
induction and additional cytotoxic chemotherapy for consolidation)
has had a significant effect on the origin of novel mutations in the
AML relapse sample.
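
The transition/transversion comparison described above can be illustrated with a minimal Python sketch. It assumes each somatic single nucleotide variant is available as a (reference base, alternate base) pair; the function names and the toy variant lists are hypothetical and are not data from the study.

```python
# A transition stays within one base class (purine<->purine or
# pyrimidine<->pyrimidine); a transversion crosses between classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNV: {ref}>{alt}")
    same_class = ({ref, alt} <= PURINES) or ({ref, alt} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"


def transversion_fraction(snvs):
    """Fraction of transversions in a list of (ref, alt) pairs."""
    if not snvs:
        raise ValueError("empty SNV list")
    n_tv = sum(classify_snv(r, a) == "transversion" for r, a in snvs)
    return n_tv / len(snvs)


# Hypothetical toy data for illustration only (not from the study).
primary_snvs = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A")]
relapse_specific_snvs = [("C", "A"), ("G", "T"), ("A", "T"), ("C", "T")]

primary_tv = transversion_fraction(primary_snvs)
relapse_tv = transversion_fraction(relapse_specific_snvs)
print(f"primary: {primary_tv:.2f}, relapse-specific: {relapse_tv:.2f}")
```

In a real analysis, the two fractions would be compared with an appropriate statistical test over all validated somatic variants rather than read off a handful of calls.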

In our sequencing analysis performed in MM, we have observed 
the presence of single nucleotide variants waxing and waning over 
the course of several longitudinally collected samples from a single 
patient.^"^ This shift in the presence of single nucleotide variants 
suggests the presence of multiple clones rising and falling in 
dominance over time. In the work by Walker et al., analysis of a single MM
patient for whom WES data were available identified three
populations containing mutations in four genes: ATM, FSIP2, CLTC
and GLMN. When they then evaluated which mutations were
shared in a single cell, they identified one population with only an
ATM mutation, a second with ATM and FSIP2 and a third with ATM,
CLTC and GLMN mutations.^^ The observed presence of these
different clones suggests that if such patients were to be followed
longitudinally, clonal dominance would likely shift as the tumor
evolves with time and treatment.
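
As a rough illustration of how per-cell mutation calls can be grouped into populations like those described above, the following Python sketch counts cells that share the same set of mutated genes. The cell identifiers and calls are hypothetical; only the gene names and the three genotype combinations come from the text.

```python
from collections import Counter


def group_cells_by_genotype(cells):
    """Count cells sharing the same set of mutated genes."""
    return Counter(frozenset(genes) for genes in cells.values())


# Hypothetical per-cell mutation calls (illustrative, not study data).
cells = {
    "cell_01": {"ATM"},
    "cell_02": {"ATM", "FSIP2"},
    "cell_03": {"ATM", "CLTC", "GLMN"},
    "cell_04": {"ATM", "FSIP2"},
    "cell_05": {"ATM", "CLTC", "GLMN"},
    "cell_06": {"ATM"},
}

populations = group_cells_by_genotype(cells)

# Mutations present in every population are candidates for early events.
shared = frozenset.intersection(*populations)

for genotype, n_cells in sorted(
    populations.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))
):
    print(sorted(genotype), n_cells)
print("shared by all populations:", sorted(shared))
```

Under these assumptions, a mutation found in all populations (here ATM) would be consistent with an event that preceded the divergence of the other clones, although real single-cell data would also require handling of allelic dropout and genotyping errors.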

The study of clonal complexity and clonal evolution is an old 
field that has been reinvigorated since the introduction of NGS. 
We believe NGS will help us to elucidate several unanswered
questions, such as: What are the driver-initiating mutations in the
different hematological malignancies? What are the specific
mutations associated with disease progression in the different 
hematological malignancies? What mutations are the primary 
contributors to chemoresistance? Does clonal heterogeneity need 
to be considered in the context of determining therapeutic 
options? Do all the clones need to be targeted? 

Some of these questions have been at least partially answered 
in a recent study.^^ The authors analyzed 149 CLL cases, including 
18 that were analyzed at two time points, using WES and SNP 
arrays. Data obtained from this study confirm previous findings 
showing the existence of linear and multibranching clonal
evolution in CLL.^'^^ Furthermore, the authors were able to infer
the order of genetic changes occurring in CLL pathogenesis. Thus,
it was suggested that the clonal driver mutations, which are
proposed to be initiating events, mainly involve alterations that
selectively affect B cells, such as MYD88 mutations and del13q, whereas
subclonal driver mutations associated with disease progression 
affect genes more ubiquitously involved in carcinogenesis such as 
TP53 and ATM. The number of subclonal mutations is higher in
treated than in untreated cases, suggesting that therapy acts
as a selective pressure leading to the emergence of
more aggressive subclones. Furthermore, the study shows the
importance of subclonal driver mutations as an independent risk 
factor for rapid disease progression and poor outcome. Thus, this 
study suggests that dissecting the clonal architecture of CLL 
is crucial not only for developing novel risk-stratification 
algorithms but also for designing novel therapeutic approaches, 
considering the presence of driver mutations as well as the 
genomic landscape. 

We expect to see similar efforts in several other hematological 
malignancies, which will ultimately help to elucidate the clonal 
complexity and its importance in each particular disease. 

HOW ARE NEW DISCOVERIES TRANSLATING INTO NOVEL 
THERAPEUTIC APPROACHES? 

Several novel somatic mutations, such as those in SF3B1, IDH1, IDH2,
DNMT3A, MYD88 and MLL2, have been identified as a consequence
of NGS efforts, leading to the discovery of previously unrecognized
genes and molecular processes/pathways with pathogenic
effects. The genomic profiling of each individual cancer will
potentially have a key clinical role, assisting in early disease
diagnosis, risk stratification, longitudinal analyses of tumor
evolution and selection of the most favorable and personalized
therapeutic intervention.

One of the most emblematic examples is AML. The unprecedented
characterization of the AML cancer genome may
substantially affect clinical management and therapeutic
decisions. The prior characterization of mutations in FLT3, NPM1,
RUNX1 and CEBPA, together with the recent identification of
mutations in IDH1, IDH2, DNMT3A and TET2, encourages the
incorporation of genomic studies as part of routine clinical tests
and may enable optimization of therapeutic plans based on this
patient-specific genomic background.

However, the genetic characterization of AML will not improve 
patient survival per se, unless it is synchronized with the 
development of alternative therapeutic approaches. One of 
the major limitations in the treatment of AML is the intrinsic drug 
resistance of the tumor cells. Standard induction chemotherapy 
regimens, consisting of cytarabine and anthracycline combinations, 
have remained largely unchanged in the treatment of AML over 
decades. Thus, the major challenge is to provide AML patients
with alternative drug combinations targeting novel genes/pathways
discovered in chemoresistant cases.

The discovery of novel genes/pathways not only increases our 
understanding of the pathogenesis of the disease but also opens 
new therapeutic avenues. The existence of potential 'Achilles'
heels' to be exploited for generating a unifying targeted therapy
for all patients is very provocative and opens an exciting era for
translational research. Exploiting this knowledge is critical in 
hematological malignancies when considering that most of them 
are still incurable and more effective therapies are urgently 
needed. 

So far, we have discovered varying degrees of genetic
heterogeneity across tumor types. We have learnt that some
malignancies have a mutated gene or pathway that affects most or
all cases. An excellent example is provided by BRAF V600E,
common to all HCL patients,^^ or MYD88 L265P, found in most
WM.26,28,79 For example, V600E BRAF can be targeted with BRAF
inhibitors^^'^^ alone or in combination with MEK or ERK inhibitors.

Conversely, the majority of hematological malignancies are 
characterized by considerable tumor heterogeneity, making the 
search for therapeutic targets more difficult. One of the biggest
challenges is to reduce the complexity of the generated data by
first, distinguishing driver from passenger mutations and
clones, and second, generating systematic and more sophisticated
approaches for data analysis and integration, thus unifying the vast
genomic heterogeneity of these cancers into more homogeneous
groupings based on cellular pathways rather than on single genes.
As in the case of single gene mutations, the disruption of specific 
pathways may be exploited therapeutically. 

With the dawn of the $1000 genome drawing close, we 
anticipate an ever-increasing role of genomic sequencing in the 
diagnosis and treatment of patients. As this technology moves ever 
closer to widespread clinical application, there are several 
challenges that must be addressed. First, the data obtained from
sequencing must be managed. Not only does
the physical storage of the data present a challenge, but how the
information obtained is reported to the patient, retained over time
and/or destroyed are important issues that also must be discussed.
Second, incidental findings of mutations in genes unrelated to the
medical reason for which a patient is seeking genome sequencing can result
in legal and ethical dilemmas for the care providers. Furthermore,
knowledge about genomics and disease is rapidly expanding; thus,
whether a patient's genome should be re-evaluated at a later time
must be considered.

As our focus shifts from large population-based studies with large 
cohorts to the 'N of one' with individualized genomic medicine, it is 
imperative that the recommendations made to patients be based 
on evidence from well-designed functional studies. Translating these
genetic data into the clinic is challenging, and a significant amount
of functional work is still required to better understand the 
biological significance of these hits using both in vitro and in vivo 
models. In the near future, we anticipate a standard of care for 
personalized medicine that involves sending samples for sequencing 
at the time of biopsy. A variant report will be generated for the 
physician who will then base treatment decisions on the findings 
from sequencing in addition to pathology and clinical diagnostics. 

The ultimate goal in the post-genomic era will be to extend to 
other hematological malignancies the successful transition from 
gene discovery to therapeutic intervention observed in the 
paradigmatic BCR-ABL CML cases treated with imatinib. 